Hey GPT-OSS, Looks Like You Got It -- Now Walk Me Through It! An Assessment of the Reasoning Language Models Chain of Thought Mechanism for Digital Forensics

Michelet, Gaëtan, Schneider, Janine, Withanage, Aruna, Breitinger, Frank

arXiv.org Artificial Intelligence

The use of large language models in digital forensics has been widely explored. Beyond identifying potential applications, research has also focused on optimizing model performance for forensic tasks through fine-tuning. However, limited result explainability reduces their operational and legal usability. Recently, a new class of reasoning language models has emerged, designed to handle logic-based tasks through an `internal reasoning' mechanism. Yet, users typically see only the final answer, not the underlying reasoning. One of these reasoning models is gpt-oss, which can be deployed locally, providing full access to its underlying reasoning process. This article presents the first investigation into the potential of reasoning language models for digital forensics. Four test use cases are examined to assess the usability of the reasoning component in supporting result explainability. The evaluation combines a new quantitative metric with qualitative analysis. Findings show that the reasoning component aids in explaining and validating language model outputs in digital forensics at medium reasoning levels, but this support is often limited, and higher reasoning levels do not enhance response quality.


A Design space

Neural Information Processing Systems

Compute resources. We trained the configurations on a large SLURM-based cluster with approximately 300,000 CPU cores available in parallel.

Data splits. We split our performance dataset into training, validation and test splits in an approximately 70-15-15 ratio. In step 1, we treated every single configuration's data points across multiple epochs as time-series data, where each epoch is a single time step, thereby grouping together all epochs of the same configuration.

Adding bounds. Since XGBoost is an unbounded regression model (i.e., its codomain is R), we add bounds to its predictions. This allows for a comprehensive analysis of the optimizer's performance.

Dataset                Average predicted runtime [CPU-d]
CIFAR-10               2.0
Colorectal-Histology   0.2
Fashion-MNIST          2.2

This does not take into account carbon emissions for optimizing and training the surrogate benchmarks based on the data, nor indirect emissions such as creating the compute hardware and maintaining the compute cluster. We noted that our surrogate models' performance on the Colorectal-Histology task was much worse.

In the first experiment, 20 configurations were randomly chosen from the set of configurations belonging to the highest fidelity group (N=5, W=16, R=1.0) that had already been evaluated on CIFAR-10 and Colorectal-Histology for our performance dataset, and re-evaluated for 200 epochs using 2 different, randomly sampled sets of seeds for initialization. We present the results of this analysis in Table 11.
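The group-aware 70-15-15 split and the bounding of the unbounded regressor's outputs described above can be sketched as follows. This is a minimal sketch under stated assumptions: `grouped_split` and `bound_predictions` are hypothetical helper names (not the paper's code), targets are assumed to lie in [0, 1], and clipping is just one simple way to add bounds.

```python
import numpy as np

def grouped_split(config_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split at the configuration level (not per epoch) so that all
    time steps of one configuration land in the same split."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(config_ids))
    n_train = int(ratios[0] * len(unique))
    n_val = int(ratios[1] * len(unique))
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    return np.array([
        "train" if c in train else "val" if c in val else "test"
        for c in config_ids
    ])

def bound_predictions(y_pred, lo=0.0, hi=1.0):
    """Clip an unbounded regressor's output into the valid target
    range (e.g., accuracies in [0, 1])."""
    return np.clip(y_pred, lo, hi)

# 10 hypothetical configurations, each observed for 5 epochs.
ids = np.repeat(np.arange(10), 5)
splits = grouped_split(ids)
```

Keeping every epoch of a configuration in one split avoids leakage: the surrogate never sees earlier epochs of a test configuration during training.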




A Appendix

Neural Information Processing Systems

Following the methodology in R. Zhang et al. (2018), we did not select our final model based on a ... This process is illustrated in Figure 1.

The first row shows that the ImageNet-C Gaussian noise corruptions (with standard deviations of 0.08, 0.12, 0.18, 0.26, 0.38) exactly align with our additive Gaussian noise for all four models, indicating that the evaluation was correctly calibrated.

The abscissa reflects the scale of the corresponding metric; e.g., MS-SSIM ranges from 0 to 1, where 1 means the two images are identical and 0 means the two images are maximally different. The other three models are Euclidean distances (i.e., 0 means that the two images are identical). More image examples are provided in Figs. ... Zoom corruptions are discussed in Section 3.3.
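The calibration step above, matching additive Gaussian noise to the quoted ImageNet-C severities and scoring with a distance-style metric (0 means identical), can be sketched in a few lines. `add_gaussian_noise` and `euclidean_distance` are illustrative names, not the paper's code, and the toy image stands in for real data.

```python
import numpy as np

def add_gaussian_noise(img, std, seed=0):
    """Additive Gaussian noise at a given standard deviation,
    mirroring the ImageNet-C severities quoted above; pixel
    values are kept in [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, std, img.shape), 0.0, 1.0)

def euclidean_distance(a, b):
    """A distance-style metric: 0 means the two images are identical,
    larger values mean stronger corruption."""
    return float(np.linalg.norm(a - b))

# The five severities quoted in the excerpt above.
severities = [0.08, 0.12, 0.18, 0.26, 0.38]
img = np.full((32, 32), 0.5)  # toy grayscale image in [0, 1]
dists = [euclidean_distance(img, add_gaussian_noise(img, s)) for s in severities]
```

With a fixed seed the same noise pattern is rescaled per severity, so the distance grows monotonically with the corruption strength.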





A Proofs

Neural Information Processing Systems

Proof of Proposition 1. For all x, y ∈ R, ... Thus, (14) is an equality, and u attains the maximum in (5), i.e., it is an optimal dual potential.

Proof of Proposition 2. ...

Proof of Proposition 3. We split the proof into 4 parts. Assume the contrary, i.e., that there exist m ≠ m' ... Thus, this case is not possible. ... Thus, the second case is also not possible.

Proof of Proposition 4. We compute W ...

In this section, we provide the details of the training of the OT solvers that we consider. In the images case, the batch size is 32.


An approach based on class activation maps for investigating the effects of data augmentation on neural networks for image classification

Dorneles, Lucas M., Garcia, Luan Fonseca, Carbonera, Joel Luís

arXiv.org Artificial Intelligence

Neural networks have become increasingly popular in the last few years as an effective tool for the task of image classification due to the impressive performance they have achieved on this task. In image classification tasks, it is common to use data augmentation strategies to increase the robustness of trained networks to changes in the input images and to avoid overfitting. Although data augmentation is a widely adopted technique, the literature lacks a body of research analyzing the effects data augmentation methods have on the patterns learned by neural network models working on complex datasets. The primary objective of this work is to propose a methodology and a set of metrics that allow a quantitative approach to analyzing the effects of data augmentation in convolutional networks applied to image classification. A key tool in the proposed approach is the class activation map, which allows us to identify and measure the importance these models assign to each individual pixel in an image when executing the classification task. From these maps, we then extract metrics capturing the similarities and differences between maps generated by models trained on a given dataset with different data augmentation strategies. Experiments conducted using this methodology suggest that the effects of these data augmentation techniques not only can be analyzed in this way but also allow us to identify different impact profiles over the trained models.
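The map-comparison idea in the abstract can be illustrated with a toy metric. The paper's exact metrics are not specified in this excerpt, so this sketch uses a hypothetical top-k pixel overlap between two class activation maps; `normalize_cam` and `topk_overlap` are assumed names.

```python
import numpy as np

def normalize_cam(cam):
    """Rescale a class activation map to [0, 1] so maps from
    different models are directly comparable."""
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def topk_overlap(cam_a, cam_b, k=0.1):
    """Jaccard overlap of the top-k fraction of most important pixels:
    1.0 means both models attend to exactly the same regions.
    (Illustrative metric, not necessarily the one used in the paper.)"""
    n = max(1, int(k * cam_a.size))
    top_a = set(np.argsort(cam_a.ravel(), kind="stable")[-n:])
    top_b = set(np.argsort(cam_b.ravel(), kind="stable")[-n:])
    return len(top_a & top_b) / len(top_a | top_b)

# Two toy maps with hotspots in opposite corners, standing in for
# models trained with different augmentation strategies.
cam1 = np.zeros((8, 8)); cam1[:2, :2] = 1.0
cam2 = np.zeros((8, 8)); cam2[-2:, -2:] = 1.0
cam1, cam2 = normalize_cam(cam1), normalize_cam(cam2)
```

A low overlap between two such maps would suggest the augmentation strategy shifted which image regions the model relies on, which is the kind of "impact profile" the abstract alludes to.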